"VIDEO CODING METHOD AND DEVICE"


FIELD OF THE INVENTION

The present invention relates to a three-dimensional (3D) video coding method for
the compression of a bitstream corresponding to an original video sequence that has been
divided into successive groups of frames (GOFs) the size of which is N = 2^n with n = 0, 1, 2,...,
said coding method comprising the following steps, applied to each successive GOF of the
sequence :

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution
decomposition of the current GOF into 2^n low and high frequency temporal subbands, said step
itself comprising :

- a motion estimation sub-step ;

- based on said motion estimation, a motion compensated temporal filtering sub-step,
performed on each of the 2^{n-1} couples of frames of the current GOF ;

- a spatial analysis sub-step, performed on the subbands resulting from said
temporal filtering sub-step ;

b) an encoding step, performed on said low and high frequency temporal subbands
resulting from the spatio-temporal analysis step and on motion vectors obtained by means of
said motion estimation step and delivering an embedded coded bitstream.

The invention also relates to a video coding device for carrying out said coding
method.


BACKGROUND OF THE INVENTION

Video streaming over heterogeneous networks requires a high degree of
scalability. That means that parts of a bitstream can be decoded without a
complete decoding of the sequence and can be combined to reconstruct the initial
video information at lower spatial or temporal resolutions (spatial/temporal
scalability) or with a lower quality (PSNR or bitrate scalability). A convenient way to
achieve all three types of scalability (spatial, temporal, PSNR) is a three-
dimensional (3D, or 2D + t) subband decomposition of the input video sequence,
after a motion compensation of said sequence.

Current standards like MPEG-4 have implemented limited scalability in a predictive
DCT-based framework through additional high-cost layers. More efficient solutions based on a
three-dimensional subband decomposition followed by a hierarchical encoding of the spatio-
temporal trees - performed by means of an encoding module based on the technique named
Fully Scalable Zerotree (FSZ) - have been recently proposed as an extension of still image
coding techniques for video : the 3D or (2D+t) subband decomposition provides a natural
spatial resolution and frame rate scalability, while the in-depth scanning of the coefficients in
the hierarchical trees and the progressive bitplane encoding technique lead to the desired
quality scalability. A higher flexibility is then obtained at a reasonable cost in terms of coding
efficiency.

The ISO/IEC MPEG standardization committee launched at the 58th
Meeting in Pattaya, Thailand, December 3-7, 2001, a dedicated Ad Hoc Group (AHG on
Exploration of Interframe Wavelet Technology in Video Coding) in order to, among
other things, explore technical approaches for interframe (e.g. motion-compensated)
wavelet coding and analyze them in terms of maturity, efficiency and potential for future
optimization. The codec described in the document PCT/EP01/04361 (PHFR000044)
is based on such an approach, illustrated in Fig.1 that shows a temporal subband
decomposition with motion compensation. This 3D wavelet decomposition with
motion compensation is applied to a group of frames (GOF), these frames being
referenced F1 to F8 and organized in successive couples of frames. Each GOF is
motion-compensated (MC) and temporally filtered (TF), thanks to a Motion
Compensated Temporal Filtering (MCTF) module. At each temporal decomposition
level, resulting low frequency temporal subbands are further filtered and the process
stops when there is only one temporal low frequency subband left (the root temporal
subband called LLL in Fig.1 where three stages of decomposition are shown : L and
H = first stage ; LL and LH = second stage ; LLL and LLH = third stage),
representing a temporal approximation of the input GOF. Also at each decomposition
level, a group of motion vector fields is generated (in Fig.1, MV4 at the first level,
MV3 at the second one, MV2 at the third one). After these two operations performed
in the MCTF module, the frames of the temporal subbands thus obtained are further
spatially decomposed and yield a spatio-temporal tree of subband coefficients.
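
The level-by-level structure of this temporal decomposition can be sketched as follows. This is only an illustrative sketch: the function name decomposition_summary and the dictionary layout are assumptions introduced here, and only the subband names (L, H, LL, LH, LLL, LLH) and the halving of the number of couples at each level come from the description above.

```python
def decomposition_summary(gof_size):
    """List, for each temporal level, the number of couples and subband names.

    Hypothetical helper: it only counts and names subbands, it filters nothing.
    """
    if gof_size < 1 or gof_size & (gof_size - 1):
        raise ValueError("GOF size must be a power of two")
    summary = []
    low_count = gof_size  # frames (or low subbands) entering the current level
    level = 0
    while low_count > 1:
        level += 1
        pairs = low_count // 2  # one motion vector field per couple
        summary.append({
            "level": level,
            "pairs": pairs,
            "low": "L" * level,               # filtered again at the next level
            "high": "L" * (level - 1) + "H",  # kept as a final subband
        })
        low_count = pairs
    return summary


for entry in decomposition_summary(8):
    print(entry)
```

For a GOF of 8 frames the sketch reports 4, 2 and 1 couples at the three levels, so that the 4 + 2 + 1 high frequency subbands and the single root subband LLL add up to the 8 temporal subbands of the GOF.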

With Haar filters used for the temporal filtering operations, motion
estimation (ME) and motion compensation (MC) are only performed every two
frames of the input sequence, the total number of ME/MC operations required for
the whole temporal tree being roughly the same as in a predictive scheme. Using
these very simple filters, the low frequency temporal subband represents a temporal
average of the input couple of frames, whereas the high frequency one contains the
residual error after the MCTF operation.
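
As a minimal sketch of such a Haar temporal filtering applied to one couple of frames, the following assumes that no motion compensation is applied (i.e. zero motion vectors) and that an unnormalized form of the filter is used (plain average and plain difference). The normalization and the function names are assumptions of this sketch and are not specified in the present description.

```python
def haar_temporal_filter(frame_a, frame_b):
    """Filter one couple of frames, given as lists of rows of pixel values.

    Returns (low, high): low is the temporal average of the couple,
    high is the residual (difference) between the two frames.
    """
    low = [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
           for row_a, row_b in zip(frame_a, frame_b)]
    high = [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
    return low, high


def haar_temporal_inverse(low, high):
    """Recover the original couple of frames from the two subbands."""
    frame_a = [[l - h / 2.0 for l, h in zip(row_l, row_h)]
               for row_l, row_h in zip(low, high)]
    frame_b = [[l + h / 2.0 for l, h in zip(row_l, row_h)]
               for row_l, row_h in zip(low, high)]
    return frame_a, frame_b


f1 = [[10, 20], [30, 40]]
f2 = [[12, 20], [30, 44]]
low, high = haar_temporal_filter(f1, f2)
print(low)   # temporal average of the couple
print(high)  # residual, zero wherever the two frames are identical
```

With motion compensation, the same two operations would be carried out along the motion trajectories rather than between co-located pixels, which tends to reduce the energy of the high frequency subband for sequences with motion.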

One of the main parameters that has been identified as being relevant
for the MCTF module of a motion compensated 3D subband video coding scheme is
the so-called "ME Activation" (or motion estimation activation), in other words the
decision to perform or not ME on a couple of input frames (for the first temporal
level) or subbands (for the following levels). For high motion activity sequences, it has
been observed that using ME and therefore performing temporal filtering along
motion trajectories does increase the overall coding efficiency. However, this gain in
coding efficiency may be lost in case of decoding at low bit-rate (one must keep in
mind that the decoding bit-rate is a priori unknown in the framework of scalable
coding), due to a possibly too high overhead for motion vectors. So it may be more
efficient in certain circumstances to decide not to activate ME so as to keep as much
bit-rate as possible for texture coding (and decoding).

SUMMARY OF THE INVENTION 

It is therefore an object of the invention to propose an encoding method avoiding
the conventional solutions encountered in current MC 3D subband video coding schemes, in
which ME Activation within a MCTF module is either arbitrarily chosen or derived from some
information obtained a posteriori, i.e. only after having actually performed MCTF.

To this end, the invention relates to a coding method such as defined in the
introductory paragraph of the description and which is moreover characterized in that said
spatio-temporal analysis step also comprises a decision sub-step for activating or not the motion
estimation sub-step, said decision sub-step itself comprising a motion activity pre-analysis
operation based on the MPEG-7 Motion Activity descriptors and performed on the input frames
or subbands to be motion compensated and temporally filtered.

According to a particularly advantageous implementation, said method is
characterized in that said decision sub-step, based on the "Intensity of activity" attribute of the
MPEG-7 Motion Activity Descriptors for all the frames or subbands of the current temporal
decomposition level, comprises the following operations :

1) for a specific temporal decomposition level :

  a) perform ME between each couple of frames (or subbands) that
  compose this level :

    - for each couple :

      - compute the standard deviation of motion vector magnitude ;

      - compute the Activity value.

  b) compute the average Activity Intensity I(av) :

    - if I(av) is equal to 5 (value corresponding to "very high intensity"), it
    is decided to deactivate ME for the current temporal decomposition level
    and for the following levels as well ;

    - if I(av) is strictly below 5, it is decided to activate ME for the current
    temporal decomposition level.

2) go to the next temporal decomposition level.

Since the ME deactivation for a specific level results in the ME deactivation for the
following levels, this technical solution leads to a significant complexity reduction of the overall
MCTF module, while still offering a good compression efficiency and above all a good
compromise between motion vector overhead and picture quality.

It is another object of the invention to propose a coding device for carrying out
such a coding method.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described, by way of example, with 
reference to the accompanying drawings in which : 

Fig.1 illustrates the conventional case of the temporal subband
decomposition of an input video sequence with motion compensation ;

Fig.2 illustrates the case in which, according to the invention, ME is 
activated for only the first temporal decomposition level and deactivated for the following 
levels. 

DETAILED DESCRIPTION OF THE INVENTION 

As seen above, the whole efficiency of any MC 3D subband video coding
scheme depends on the specific efficiency of its MCTF module in compacting the
temporal energy of the input GOF. As the parameter "ME Activation" is now known
to be a major one for the success of MCTF, it is proposed, according to the
invention, to derive this parameter from a dynamical Motion Activity pre-analysis of
the input frames (or subbands) to be motion-compensated and temporally filtered, using
normative (MPEG-7) motion descriptors (see the document "Overview of the MPEG-7
Standard, version 6.0", ISO/IEC JTC1/SC29/WG11 N4509, Pattaya, Thailand,
December 2001, pp.1-93). The following description will define which descriptor is
used and how it influences the choice of the above-mentioned encoding parameter.

In the 3D video coding scheme described above, ME/MC is generally
arbitrarily performed on each couple of frames (or subbands) of the current temporal
decomposition level. It is now proposed to either activate or deactivate ME according
to the "Intensity of activity" attribute of the MPEG-7 Motion Activity Descriptors, and
this for all the frames - or subbands - of the current temporal decomposition level
(this attribute takes integer values within the [1, 5] range ; for instance 1
means "very low intensity" and 5 means "very high intensity"). This Activity
Intensity attribute is obtained by performing ME as it would be done anyway in a
conventional MCTF scheme and using statistical properties of the motion-vector
magnitude thus obtained. The quantized standard deviation of the motion-vector magnitude
is a good metric for the motion Activity Intensity, and the Intensity value can be derived
from the standard deviation using thresholds. The ME Activation will therefore be
obtained as now described :
1) for a specific temporal decomposition level :

  a) perform ME between each couple of frames (or subbands) that
  compose this level :

    - for each couple :

      - compute the standard deviation of motion vector magnitude ;

      - compute the Activity value.

  b) compute the average Activity Intensity I(av) :

    - if I(av) is equal to 5 (value corresponding to "very high intensity"), it
    is decided to deactivate ME for the current temporal decomposition level
    and for the following levels as well ;

    - if I(av) is strictly below 5, it is decided to activate ME for the current
    temporal decomposition level.

2) go to the next temporal decomposition level.
If ME is activated for a specific level, based on such a pre-analysis, motion vectors
are already computed and can be directly used for MCTF of that level. On the
contrary, if ME is deactivated, the motion vectors pre-computed for the needs of the
pre-analysis are then useless and can be discarded. Moreover, the ME deactivation
for a specific level results in the ME deactivation for the following levels, which leads
to a reduction of complexity of the overall MCTF module, as illustrated for example
in Fig.2 corresponding to the case in which ME is only activated for the first temporal
decomposition level and deactivated for the following ones.
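
The decision procedure described above can be sketched as follows. This is a non-authoritative illustration under several assumptions: the threshold values are placeholders and are not the normative MPEG-7 values, the names activity_intensity and decide_me_activation are hypothetical, I(av) is taken as the unrounded mean of the per-couple intensities (the description does not state whether it is rounded), and the motion vector fields of every level are supplied up front for simplicity. In an actual encoder, the fields of a level would be estimated on the low frequency subbands of the previous level, and no estimation at all would be needed once ME has been deactivated.

```python
import math
import statistics

# Placeholder thresholds on the standard deviation of motion-vector magnitude.
# They are NOT the normative MPEG-7 values, only an assumption for this sketch.
ASSUMED_THRESHOLDS = (2.0, 4.0, 8.0, 16.0)


def activity_intensity(vector_field, thresholds=ASSUMED_THRESHOLDS):
    """Quantize the std. deviation of vector magnitudes into an intensity 1..5."""
    magnitudes = [math.hypot(dx, dy) for dx, dy in vector_field]
    sigma = statistics.pstdev(magnitudes)
    return 1 + sum(sigma >= t for t in thresholds)


def decide_me_activation(fields_per_level, thresholds=ASSUMED_THRESHOLDS):
    """Return one boolean per temporal level: True if ME is activated.

    fields_per_level[k] holds one motion vector field per couple of level k+1,
    each field being a list of (dx, dy) vectors.
    """
    decisions = []
    deactivated = False
    for fields in fields_per_level:
        if deactivated:
            # Deactivation at one level carries over to all following levels.
            decisions.append(False)
            continue
        intensities = [activity_intensity(f, thresholds) for f in fields]
        i_av = sum(intensities) / len(intensities)
        if i_av == 5:  # "very high intensity": the vectors are discarded
            deactivated = True
        decisions.append(not deactivated)
    return decisions


calm = [(1, 0), (1, 0), (1, 0), (1, 0)]    # uniform motion, sigma = 0
wild = [(0, 0), (40, 0), (0, 0), (40, 0)]  # erratic motion, sigma = 20
decisions = decide_me_activation([[calm] * 4, [wild] * 2, [calm]])
print(decisions)
```

With these illustrative inputs, ME is activated for the first level and deactivated for the second and third ones, which corresponds to the situation of Fig.2.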
CLAIMS :


1. A three-dimensional (3D) video coding method for the compression of a bitstream
corresponding to an original video sequence that has been divided into successive groups of
frames (GOFs) the size of which is N = 2^n with n = 0, 1, 2,..., said coding method comprising
the following steps, applied to each successive GOF of the sequence :

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution
decomposition of the current GOF into 2^n low and high frequency temporal subbands, said step
itself comprising :

- a motion estimation sub-step ;

- based on said motion estimation, a motion compensated temporal filtering sub-step,
performed on each of the 2^{n-1} couples of frames of the current GOF ;

- a spatial analysis sub-step, performed on the subbands resulting from said
filtering sub-step ;

b) an encoding step, performed on said low and high frequency temporal subbands
resulting from the spatio-temporal analysis step and on motion vectors obtained by means of
said motion estimation step and delivering an embedded coded bitstream ;

said coding method being further characterized in that said spatio-temporal analysis step also
comprises a decision sub-step for activating or not the motion estimation sub-step, said decision
sub-step itself comprising a motion activity pre-analysis operation based on the MPEG-7 Motion
Activity descriptors and performed on the input frames or subbands to be motion compensated
and temporally filtered.

2. A coding method according to claim 1, said decision sub-step being based on the
Intensity of activity attribute of the MPEG-7 Motion Activity Descriptors for all the frames or
subbands of the current temporal decomposition level and comprising the following operations :

1) for a specific temporal decomposition level :

  a) perform ME between each couple of frames (or subbands) that
  compose this level :

    - for each couple :

      - compute the standard deviation of motion vector magnitude ;

      - compute the Activity value.

  b) compute the average Activity Intensity I(av) :

    - if I(av) is equal to 5 (value corresponding to "very high intensity"), it
    is decided to deactivate ME for the current temporal decomposition level
    and for the following levels as well ;

    - if I(av) is strictly below 5, it is decided to activate ME for the current
    temporal decomposition level.

2) go to the next temporal decomposition level.
3. A video coding device for the compression of a bitstream corresponding to an
original video sequence that has been divided into successive groups of frames (GOFs) the size
of which is N = 2^n with n = 0, 1, 2,..., said coding device comprising the following elements :

a) spatio-temporal analysis means, applied to each successive GOF of the sequence and
leading to a spatio-temporal multiresolution decomposition of the current GOF into 2^n low and
high frequency temporal subbands, said analysis means themselves comprising :

- a motion estimation circuit ;

- based on the result of said motion estimation, a motion compensated temporal
filtering circuit, applied to each of the 2^{n-1} couples of frames of the current GOF ;

- a spatial analysis circuit, applied to the subbands delivered by said temporal
filtering circuit ;

b) encoding means, applied to the low and high frequency temporal subbands delivered
by said spatio-temporal analysis means and to motion vectors delivered by said motion
estimation circuit, said encoding means delivering an embedded coded bitstream ;

said coding device being further characterized in that said spatio-temporal analysis means also
comprise a decision circuit for activating or not the motion estimation circuit, said decision
circuit itself comprising a motion activity pre-analysis stage, using the MPEG-7 Motion Activity
descriptors and applied to the input frames or subbands to be motion compensated and
temporally filtered.
Abstract

The invention relates to a three-dimensional (3D) video coding method for the
compression of a coded bitstream corresponding to an original video sequence that has been
divided into successive groups of frames (GOFs). This method, applied to each GOF of the
sequence, comprises (a) a spatio-temporal analysis step, leading to a spatio-temporal
multiresolution decomposition of the current GOF into low and high frequency temporal
subbands and itself comprising a motion estimation sub-step, a motion compensated temporal
filtering sub-step, and a spatial analysis sub-step ; (b) an encoding step, performed on said low
and high frequency temporal subbands and on motion vectors obtained by means of said
motion estimation step. According to the invention, said spatio-temporal analysis step also
comprises a decision sub-step for activating or not the motion estimation sub-step, said decision
sub-step itself comprising a motion activity pre-analysis operation based on the MPEG-7 Motion
Activity descriptors and performed on the input frames or subbands to be motion compensated
and temporally filtered.




